Search CORE

7 research outputs found

Interactive Teaching Algorithms for Inverse Reinforcement Learning

Author: Cevher Volkan
Devidze Rati
Kamalaruban Parameswaran
Singla Adish
Publication date: 01/01/2019

We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic question: How could a teacher provide an informative sequence of demonstrations to an IRL learner to speed up the learning process? We present an interactive teaching framework where a teacher adaptively chooses the next demonstration based on the learner's current policy. In particular, we design teaching algorithms for two concrete settings: an omniscient setting where a teacher has full knowledge about the learner's dynamics and a blackbox setting where the teacher has minimal knowledge. Then, we study a sequential variant of the popular MCE-IRL learner and prove convergence guarantees of our teaching algorithm in the omniscient setting. Extensive experiments with a car driving simulator environment show that the learning progress can be sped up drastically compared to an uninformative teacher.
Comment: IJCAI'19 paper (extended version)

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

MPG.PuRe

Learning to Play Text-based Adventure Games with Maximum Entropy Reinforcement Learning

Author: Devidze Rati
Fellenz Sophie
Li Weichen
Publication date: 27/06/2023

Text-based games are a popular testbed for language-based reinforcement learning (RL). In previous work, deep Q-learning is commonly used as the learning agent. Q-learning algorithms are challenging to apply to complex real-world domains due to, for example, their instability in training. Therefore, in this paper, we adapt the soft actor-critic (SAC) algorithm to the text-based environment. To deal with sparse extrinsic rewards from the environment, we combine it with a potential-based reward shaping technique to provide more informative (dense) reward signals to the RL agent. We apply our method to play difficult text-based games. The SAC method achieves higher scores than the Q-learning methods on many games with only half the number of training steps. This shows that it is well-suited for text-based games. Moreover, we show that the reward shaping technique helps the agent to learn the policy faster and achieve higher scores. In particular, we consider a dynamically learned value function as a potential function for shaping the learner's original sparse reward signals.

arXiv.org e-Print Archive

Interactive Teaching Algorithms for Inverse Reinforcement Learning

Author: Cevher Volkan
Devidze Rati
Parameswaran Kamalaruban
Singla Adish
Publication date: 15/10/2019

Infoscience - École polytechnique fédérale de Lausanne

Interactive Teaching Algorithms for Inverse Reinforcement Learning

Author: Adish Singla
Cevher Volkan
Parameswaran Kamalaruban
Rati Devidze
Publication date: 10/10/2019

Infoscience - École polytechnique fédérale de Lausanne